
Address ernie-image review findings #13577#13663

Open
akshan-main wants to merge 4 commits into huggingface:main from akshan-main:ernie-image-review-fixes

Conversation

@akshan-main
Contributor

What does this PR do?

Partial fix for #13577. Addresses items 1, 2, and 5 per @yiyixuxu's scoping.

  • (1) Switch ErnieImageAutoPromptEnhancerStep to ConditionalPipelineBlocks so use_pe=False actually skips the prompt enhancer (AutoPipelineBlocks selects on presence, not truthiness).
  • (2) Align the modular VAE BN epsilon to the standard pipeline's hardcoded 1e-5 (matches training; the hub config currently reports 1e-4).
  • (5) Restructure output_type="latent" so it runs maybe_free_model_hooks() and honors return_dict, matching the QwenImage/Flux2 pattern.
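For context on (5), a minimal sketch of the restructure, with the pipeline's VAE decode step, maybe_free_model_hooks(), and ImagePipelineOutput stood in by plain callables and a dict (names here are illustrative, not the actual diffusers implementation):

```python
# Hedged sketch of the output_type="latent" restructure: the latent branch
# no longer returns early, so the hook-offloading call runs on every path
# and return_dict is honored. `decode` stands in for VAE decoding,
# `free_hooks` for maybe_free_model_hooks(), and the dict for
# ImagePipelineOutput.
def finish(latents, output_type, return_dict, decode, free_hooks):
    if output_type == "latent":
        image = latents
    else:
        image = decode(latents)
    free_hooks()  # runs even when output_type="latent"
    if not return_dict:
        return (image,)
    return {"images": image}
```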

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue? ernie-image model/pipeline review #13577
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@yiyixuxu


Comment on lines 54 to 58
Member


Can we also take the opportunity to change the auto classes to the real classes? It gets confusing for some users who pass the text encoder directly (e.g. for quantization), and it's also kind of annoying to get the warning all the time.

@github-actions github-actions Bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels Apr 30, 2026
@akshan-main
Contributor Author

@asomoza switched text_encoder to Mistral3Model and pe to Ministral3ForCausalLM in both the standard and modular pipelines. Left the tokenizers as AutoTokenizer since Mistral doesn't have a model-specific tokenizer class.

@akshan-main akshan-main requested a review from asomoza April 30, 2026 16:29
Comment on lines +368 to +369
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
Contributor


dtype casting to be safe and for consistency with the modular pipeline

Suggested change
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device=device, dtype=latents.dtype)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device=device, dtype=latents.dtype)

There could be a TODO regarding vae.config.batch_norm_eps, it should be used in the future if the checkpoint config is changed
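As a framework-free sketch of the per-channel math behind the statistics above (the real code operates on tensors and, per the suggestion, should also cast to latents.dtype; the direction of the transform here is my reading of the snippet, labeled as an assumption):

```python
import math

# Illustrative scalar version of the BatchNorm statistics computed above.
# Assumption: the pipeline uses them to de-normalize latents before
# decoding, i.e. x * std + mean with std = sqrt(running_var + eps).
# eps is pinned to 1e-5 (the training value); the TODO is to read
# vae.config.batch_norm_eps once the hub config is corrected.
BN_EPS = 1e-5

def bn_denormalize(x, running_mean, running_var, eps=BN_EPS):
    std = math.sqrt(running_var + eps)
    return x * std + running_mean
```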

Contributor Author


Added

Comment on lines +379 to +383
images = (images.clamp(-1, 1) + 1) / 2
images = images.cpu().permute(0, 2, 3, 1).float().numpy()

if output_type == "pil":
images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
Contributor


Can VaeImageProcessor be used here? cc @yiyixuxu: enforcing VaeImageProcessor could be another agent-review rule?

Contributor Author


Switched both the standard and modular pipelines to VaeImageProcessor.postprocess. This also fixes output_type="pt" in the standard pipeline (it was returning NumPy arrays).
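For reference, a minimal scalar sketch of the pixel math the replaced snippet performed; VaeImageProcessor.postprocess covers the same [-1, 1] denormalization and additionally dispatches on output_type ("pil", "np", "pt"), which is what fixes the "pt" path (this helper is illustrative, not the diffusers code):

```python
def to_uint8(x):
    # What the replaced snippet did per pixel: clamp to [-1, 1], shift to
    # [0, 1], then scale to uint8 the way (img * 255).astype("uint8")
    # does, i.e. by truncation. VaeImageProcessor.postprocess replaces
    # this hand-rolled conversion and handles output_type dispatch.
    x = min(max(x, -1.0), 1.0)
    x = (x + 1.0) / 2.0
    return int(x * 255)
```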

@akshan-main akshan-main requested a review from hlky April 30, 2026 19:37
